-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(meta): record barrier latency in event table #13633
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #13633 +/- ##
=======================================
Coverage 68.15% 68.15%
=======================================
Files 1519 1519
Lines 261508 261526 +18
=======================================
+ Hits 178230 178246 +16
- Misses 83278 83280 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -76,7 +76,7 @@ pub struct Reschedule { | |||
/// [`Command`] is the action of [`crate::barrier::GlobalBarrierManager`]. For different commands, | |||
/// we'll build different barriers to send, and may do different stuffs after the barrier is | |||
/// collected. | |||
#[derive(Debug, Clone)] | |||
#[derive(Debug, Clone, strum::Display)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits: the inner fields of some enum item could be extremely large, I'm not sure if we should pick and only display some important info manually only for them. It's acceptable for me to keep it this way since only latest 10 events will be recorded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strum::Display only include the name. See example here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
I am a little bit suspicious about this PR. Because other event kinds are truely critical failures, while this one is more like a metric to me, even not worth a INFO-level log.
|
Btw, I have once discussed with @BugenZhao about whether we can introduce a mini metrics store in Meta node, to keep some important metrics (e.g. barrier latency, fragment throughput, fragment backpressure, etc.) for dashboard without need of Prometheus. If your motivation is to view the metrics without Prometheus, I think this might be better than recording |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
part of #12826
Failed barrier doesn't write a event log.
Event log table only records latest 10 events for each type by default. We can customize it on demand.
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.